# Jackknife and Bootstrap for Traffic Problems
Dr. M. Baron, Statistical Machine Learning class, STAT-427/627
One needs to estimate p, the probability of a day with 0 traffic accidents on a certain highway. Data are collected during 40 days: 26 days with 0 accidents, 10 days with 1 accident, and 4 days with 2 accidents.
Statistician A estimates p by the sample proportion p̂ = 26/40 = 0.65.
Statistician B argues that this method does not distinguish between the days with 1 accident and the days with 2 accidents, losing some valuable information. She suggests modeling the number of accidents X by a Poisson distribution with parameter λ. Then p = P(X = 0) = e^{-λ}. Statistician B estimates λ by the sample mean X̄, so p̂_Poisson = e^{-X̄} is a new estimator of p, and it uses all the information contained in the data.
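For these data, X̄ = (26·0 + 10·1 + 4·2)/40 = 18/40 = 0.45, so Statistician B's estimate is e^{-0.45} ≈ 0.6376, which matches the R output below.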
However, this estimator is biased: e^{-x} is convex, so by Jensen's inequality E[e^{-X̄}] > e^{-λ}. Both statisticians use the jackknife to reduce the bias, and together they obtain the jackknife estimator p̂_JK.
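For reference, the bias-corrected jackknife estimator (the "jackknife formula" used in the code below) is p̂_JK = n·p̂_Poisson − (n−1)·mean(p̂_Poisson,(−i)), where p̂_Poisson,(−i) is the Poisson-based estimate computed with the i-th day left out, and n = 40. Equivalently, p̂_JK = p̂_Poisson − bias_jack, where bias_jack = (n−1)·(mean(p̂_Poisson,(−i)) − p̂_Poisson) is the jackknife estimate of the bias.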
Now we have three competing estimators: p̂, p̂_Poisson, and p̂_JK. Evaluate and rank them according to their bias and standard error.
Solution:
We can generate the data as follows.
> X = c( rep(0,26), rep(1,10), rep(2,4) )
> X
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
[35] 1 1 2 2 2 2
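As a quick check, the counts of days with 0, 1, and 2 accidents can be tabulated:
> table(X)
X
 0  1  2 
26 10  4 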
Define the function P.poisson = exp(−sample mean) and apply the jackknife.
> P.poisson = function( X ){
+   return( exp(-mean(X)) ) }
> P.poisson(X)
[1] 0.6376282
> library(bootstrap)
> JK = jackknife(X, P.poisson)
> JK$jack.bias
[1] 0.003683256
> P.JK = P.poisson(X) - JK$jack.bias
> P.JK
[1] 0.6339449
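The jackknife bias above can also be computed with an explicit leave-one-out loop; a minimal sketch (the names P.i and bias.jack are only illustrative):
> n = length(X)
> P.i = rep(0, n)
> for (i in 1:n){ P.i[i] = exp(-mean(X[-i])) }       # leave-one-out estimates
> bias.jack = (n-1)*( mean(P.i) - P.poisson(X) )     # same quantity as JK$jack.bias
> P.poisson(X) - bias.jack                           # same quantity as P.JK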
Our three estimates are evaluated as p̂ = 0.65, p̂_Poisson = 0.6376, and p̂_JK = 0.6339.
According to the bias: p̂ is unbiased; p̂_Poisson has an estimated bias of 0.0037; and p̂_JK has a lower bias than p̂_Poisson, because the jackknife reduces bias. Thus, according to the bias, the estimators are ranked as:
(1) p̂
(2) p̂_JK
(3) p̂_Poisson
According to the standard deviation, we compare the estimators with the bootstrap. Define three functions: the sample proportion proposed by Statistician A, the estimator proposed by Statistician B, and its jackknife version.
> P.hat = function( X, sample ){
+   return( mean(X[sample]==0) ) }
> P.poisson = function( X, sample ){
+   return( exp(-mean(X[sample])) ) }
> P.JK = function( X, sample ){
+   n = length(X)
+   P.i = rep(0,n)
+   for (i in 1:n){ P.i[i] = P.poisson(X[sample], -i) }   # leave one out
+   return( n*P.poisson(X, sample) - (n-1)*mean(P.i) ) }  # jackknife formula
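As a quick sanity check before running the bootstrap, applying each function to the full index vector 1:40 should reproduce the point estimates obtained above:
> P.hat(X, 1:40)        # 0.65
> P.poisson(X, 1:40)    # 0.6376282
> P.JK(X, 1:40)         # 0.6339449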
Now, use the bootstrap to evaluate the performance of these estimators from the given data.
> library(boot)
> boot( X, P.hat, R=10000 )
      original         bias    std. error
t1*       0.65   -0.0005225    0.07590366
> boot( X, P.poisson, R=10000 )
      original         bias    std. error
t1*  0.6376282   0.004304599   0.06729368
> boot( X, P.JK, R=10000 )
      original         bias    std. error
t1*  0.6339449   0.004085077   0.06780789
We knew that the first estimator, p̂, is unbiased. However, it ignores some information, as is correctly noted in the problem. As a result, it has the highest standard error among the three competing estimators. The second estimator, p̂_Poisson, has some bias, but it has a lower standard error. Its jackknife version p̂_JK achieves some bias reduction, but at the expense of a slightly higher standard error.
These results may vary somewhat because they are
based on random bootstrap samples.
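To make such a run reproducible, fix the random seed before calling boot(); a minimal example (the seed value 627 is arbitrary):
> set.seed(627)
> boot( X, P.poisson, R=10000 )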
Summary
According to the lower bias, the estimators are ranked as
(1) p̂ (best)
(2) p̂_JK
(3) p̂_Poisson (worst)
According to the lower standard deviation (standard error), the estimators are ranked as
(1) p̂_Poisson (best)
(2) p̂_JK
(3) p̂ (worst)